Fully-Nested Interactive POMDPs for Partially-Observable Turn-Based Games
Abstract
Interactive POMDPs (I-POMDPs) are a useful framework for describing POMDPs that interact with other POMDPs. I-POMDPs are solved recursively in levels: a level-1 I-POMDP assumes the opponent acts randomly, and a level-k I-POMDP assumes the opponent is a level-(k-1) I-POMDP. In this paper, we introduce fully-nested I-POMDPs, which are uncertain about the physical state of the game, the level of their opponent, and the opponent’s belief about both. This paper has three main contributions: it (1) introduces the framework for turn-based fully-nested I-POMDPs and shows how to reduce them to POMDPs; (2) motivates fully-nested I-POMDPs by introducing the game of partially-observable nim and solving it using SARSOP; and (3) shows empirically that solving fully-nested I-POMDPs for this game remains tractable as the level increases.
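The level-k recursion can be pictured with a small sketch. The following is a minimal, runnable illustration rather than the paper's method: for brevity it uses ordinary fully-observed nim and an exhaustive best-response, whereas the paper folds the level-(k-1) opponent model into a POMDP over partially-observable nim and solves that POMDP with SARSOP. All function names (`legal_moves`, `best_response`, `level_k_policy`) are illustrative.

```python
# Illustrative sketch of the level-k recursion on fully observed nim:
# a level-1 agent best-responds to a uniformly random opponent, and a
# level-k agent best-responds to the level-(k-1) policy.
from functools import lru_cache

MAX_TAKE = 3  # remove 1..3 stones per turn; taking the last stone wins


def legal_moves(stones):
    return range(1, min(MAX_TAKE, stones) + 1)


def random_opponent(stones):
    """Level-0 model: a uniform distribution over legal moves."""
    moves = list(legal_moves(stones))
    return {m: 1.0 / len(moves) for m in moves}


def best_response(opponent):
    """Best-respond to a fixed opponent model (stones -> move distribution)."""

    @lru_cache(maxsize=None)
    def value(stones):
        # Win probability when it is our turn with `stones` stones left.
        if stones == 0:
            return 0.0  # the opponent just took the last stone
        return max(q_value(stones, a) for a in legal_moves(stones))

    def q_value(stones, a):
        rest = stones - a
        if rest == 0:
            return 1.0  # we take the last stone and win
        # The opponent moves next; average over its move distribution.
        return sum(p * value(rest - b) for b, p in opponent(rest).items())

    def policy(stones):
        return max(legal_moves(stones), key=lambda a: q_value(stones, a))

    return policy


def level_k_policy(k):
    """Level 1 best-responds to the random model; level k to level k-1."""
    opponent = random_opponent
    policy = None
    for _ in range(k):
        policy = best_response(opponent)
        # Wrap the deterministic policy into a point-mass distribution so
        # the next level can best-respond to it.
        opponent = lambda s, pi=policy: {pi(s): 1.0}
    return policy


if __name__ == "__main__":
    pi2 = level_k_policy(2)
    print([pi2(s) for s in range(1, 10)])  # level-2 moves for 1..9 stones
```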
Related papers
Anytime Point Based Approximations for Interactive POMDPs
Partially observable Markov decision processes (POMDPs) have been largely accepted as a rich framework for planning and control problems. In settings where multiple agents interact, POMDPs prove to be inadequate. The interactive partially observable Markov decision process (I-POMDP) is a new paradigm that extends POMDPs to multiagent settings. The added complexity of this model due to the modeli...
Generalized and bounded policy iteration for finitely-nested interactive POMDPs: scaling up
Policy iteration algorithms for partially observable Markov decision processes (POMDPs) offer the benefits of quick convergence and the ability to operate directly on the solution, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations, which makes its evaluation and improvement costly. Bounded policy iteration pr...
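As a rough illustration of the finite-state-controller representation mentioned above (not this paper's algorithm), the sketch below evaluates a fixed two-node controller on a made-up two-state POMDP; all model numbers and names are invented for the example.

```python
# Evaluating a deterministic finite state controller: each node fixes an
# action, each observation selects the next node, and V[n, s] is the value
# of starting the controller in node n from state s.
import numpy as np

# Toy POMDP: 2 states, 2 actions, 2 observations (all values illustrative).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],      # T[a][s][s']
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],      # O[a][s'][o]
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, -1.0],                   # R[a][s]
              [0.2, 0.2]])
GAMMA = 0.95

# Controller: node -> action, and (node, observation) -> next node.
node_action = [0, 1]
node_next = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}


def evaluate_controller(node_action, node_next, iters=500):
    """Iteratively compute V[n, s] for the fixed controller."""
    n_nodes, n_states = len(node_action), T.shape[1]
    V = np.zeros((n_nodes, n_states))
    for _ in range(iters):
        V_new = np.zeros_like(V)
        for n, a in enumerate(node_action):
            for s in range(n_states):
                future = 0.0
                for s2 in range(n_states):
                    for o in range(O.shape[2]):
                        future += T[a][s][s2] * O[a][s2][o] * V[node_next[(n, o)]][s2]
                V_new[n][s] = R[a][s] + GAMMA * future
        V = V_new
    return V


print(evaluate_controller(node_action, node_next))
```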
Decayed Markov Chain Monte Carlo for Interactive POMDPs
To act optimally in a partially observable, stochastic, and multi-agent environment, an autonomous agent needs to maintain a belief about the world at any given time. An extension of partially observable Markov decision processes (POMDPs), called interactive POMDPs (I-POMDPs), provides a principled framework for planning and acting in such settings. The I-POMDP augments POMDP beliefs by including m...
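The interactive belief described here can be pictured with a toy particle representation. The sketch below is only schematic (it is not the decayed-MCMC method this paper proposes), and every model function in it is a placeholder: each particle pairs a physical state with a model of the other agent (here just its nesting level), and the belief is reweighted by the likelihood of the received observation.

```python
# Schematic interactive-belief update: particles over (state, opponent model).
def transition(state, my_action, opp_level):
    """Placeholder dynamics: the opponent model influences the next state."""
    return state + my_action - (1 if opp_level > 0 else 0)


def obs_likelihood(observation, state):
    """Placeholder observation model: a noisy reading of the state."""
    return 0.7 if observation == state else 0.3


def update_interactive_belief(particles, my_action, observation):
    """particles: list of (weight, physical_state, opponent_level)."""
    updated = []
    for w, s, lvl in particles:
        s2 = transition(s, my_action, lvl)
        updated.append((w * obs_likelihood(observation, s2), s2, lvl))
    total = sum(w for w, _, _ in updated) or 1.0
    return [(w / total, s, lvl) for w, s, lvl in updated]


# Initial uncertainty over both the physical state and the opponent's level.
belief = [(0.25, 3, 0), (0.25, 3, 1), (0.25, 4, 0), (0.25, 4, 1)]
belief = update_interactive_belief(belief, my_action=1, observation=3)
print(belief)
```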
Improved Planning for Infinite-Horizon Interactive POMDPs using Probabilistic Inference (Extended Abstract)
We provide the first formalization of self-interested multiagent planning using expectation-maximization (EM). Our formalization in the context of infinite-horizon and finitely-nested interactive POMDPs (I-POMDPs) is distinct from EM formulations for POMDPs and other multiagent planning frameworks. Specific to I-POMDPs, we exploit the graphical model structure and present a new approach based on b...
On the Difficulty of Achieving Equilibrium in Interactive POMDPs
We analyze the asymptotic behavior of agents engaged in an infinite-horizon partially observable stochastic game as formalized by the interactive POMDP framework. We show that when agents’ initial beliefs satisfy a truth compatibility condition, their behavior converges to a subjective ε-equilibrium in finite time, and to a subjective equilibrium in the limit. This result is a generalization of a ...